Traffic Violations in Maryland

Henry Phan, Jason Lim


In [57]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
import folium
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime
import datetime
import statsmodels.formula.api as sm 
In [58]:
original = pd.read_excel('traffic_sample.xlsx')
In [59]:
# Converting the Latitude and Longitude Attributes to a Float
original["Latitude"] = original["Latitude"].astype(float)
original["Longitude"] = original["Longitude"].astype(float)

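# Drop stops where the driver's gender was recorded as unknown ("U")
original = original[original["Gender"] != "U"]
# Keep only rows whose Year lies strictly between 1900 and 2020 (this also excludes 0)
original = original[(original["Year"] != 0) & (original["Year"] < 2020) & (original["Year"] > 1900)]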

original.head(n = 25)
Out[59]:
Unnamed: 0 Date Of Stop Time Of Stop Agency SubAgency Description Location Latitude Longitude Accident ... Charge Article Contributed To Accident Race Gender Driver City Driver State DL State Arrest Type Geolocation
0 415031 2016-05-01 23:08:00 MCP 4th district, Wheaton DRIVER FAIL TO STOP AT RED TRAFFIC SIGNAL BEFO... GEORGIA AVE AT GLENALLAN AVE 39.063522 -77.055263 No ... 21-202(i1) Transportation Article No HISPANIC M SILVER SPRING MD MD A - Marked Patrol (39.0635216666667, -77.0552633333333)
1 261574 2017-11-30 01:14:00 MCP 3rd district, Silver Spring DRIVING VEHICLE IN EXCESS OF REASONABLE AND PR... RANDOLPH ROAD AT TAMARACK RD 39.067270 -76.984982 No ... 21-801(a) Transportation Article No BLACK M SILVER SPRING MD MD A - Marked Patrol (39.06727, -76.9849816666667)
2 523346 2012-05-24 10:48:00 MCP 3rd district, Silver Spring DRIVER USING HANDS TO USE HANDHELD TELEPHONE W... WAYNE AVE / COLESVILLE RD, W/B 38.995165 -77.031199 No ... 21-1124.2(d2) Transportation Article No HISPANIC F SILVER SPRING MD MD L - Motorcycle (38.9951653333333, -77.0311989333333)
3 213661 2012-10-18 03:30:00 MCP 3rd district, Silver Spring DRIVING TO DRIVE MOTOR VEHICLE ON HIGHWAY WITH... COLUMBIA PIKE AT LORRAIN AVE 38.998501 -77.026377 No ... 16-101(a) Transportation Article No BLACK M CHESAPEAKE WV WV A - Marked Patrol (38.9985010833333, -77.02637735)
4 330686 2012-02-16 07:47:00 MCP 6th district, Gaithersburg / Montgomery Village EXCEEDING MAXIMUM SPEED: 34 MPH IN A POSTED 25... GAME PRESERVE RD N/B (11400 BLOCK) 39.157287 -77.239419 No ... 21-801.1 Transportation Article No HISPANIC F GERMANTOWN MD MD Q - Marked Laser (39.1572871333333, -77.23941855)
5 567779 2015-09-18 14:19:00 MCP 5th district, Germantown DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC... OBSERVATION DR AND SENECA MEADOWS PARKWA 39.198632 -77.253483 No ... 21-201(a1) Transportation Article No OTHER M GAITHERSBURG MD MD A - Marked Patrol (39.1986316666667, -77.2534833333333)
6 767243 2014-09-05 01:26:00 MCP 3rd district, Silver Spring DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC... COLUMBIA PIKE @ FAIRLAND RD 39.070163 -76.952160 No ... 21-201(a1) Transportation Article No WHITE M BETHESDA MD MD A - Marked Patrol (39.0701633333333, -76.95216)
7 391162 2016-11-14 08:08:00 MCP 3rd district, Silver Spring DRIVING VEHICLE ON HIGHWAY WITHOUT CURRENT REG... TAMARACK RD / E RANDOLPH RD 39.046277 -76.990695 No ... 13-411(d) Transportation Article No BLACK M SILVER SPRING MD MD A - Marked Patrol (39.0462766666667, -76.990695)
8 399408 2016-05-07 21:17:00 MCP 3rd district, Silver Spring FAILURE TO CONTROL VEHICLE SPEED ON HIGHWAY TO... NB NEW HAMPSHIRE AVE @ OAKVIEW DR 39.014340 -77.034123 No ... 21-801(b) Transportation Article Yes WHITE M OLNEY MD MD A - Marked Patrol (39.01434, -77.0341233333333)
9 595871 2018-02-06 17:31:00 MCP 1st district, Rockville FAILURE OF LICENSEE TO NOTIFY ADMINISTRATION O... I-270 PRIOR TO MONTROSE 39.035290 -77.143953 No ... 16-116(a) Transportation Article No BLACK F GERMANTOWN MD MD A - Marked Patrol (39.03529, -77.1439533333333)
10 854814 2015-10-20 17:10:00 MCP 5th district, Germantown DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC... OBSERVATION DR @ SHAKESPEARE BLVD 39.201037 -77.252962 No ... 21-201(a1) Transportation Article No OTHER F ADAMSTOWN MD MD A - Marked Patrol (39.2010366666667, -77.2529616666667)
11 29623 2015-12-26 14:11:00 MCP 5th district, Germantown DRIVER FAILURE TO STOP AT INTERSECTION HWY. ST... WISTERIA DR/GERMANTOWN RD 39.177412 -77.271238 No ... 21-403(c) Transportation Article No ASIAN F GAITHERSBURG MD MD A - Marked Patrol (39.1774116666667, -77.2712383333333)
12 786363 2014-12-11 04:58:00 MCP 4th district, Wheaton DRIVING VEHICLE IN EXCESS OF REASONABLE AND PR... EB BROOKEVILLE ROAD/ZION ROAD 39.184595 -77.089718 No ... 21-801(a) Transportation Article No HISPANIC M MONTGOMERY VILLAGE MD MD A - Marked Patrol (39.184595, -77.0897183333333)
13 494721 2015-09-23 14:18:00 MCP 1st district, Rockville EXCEEDING MAXIMUM SPEED: 49 MPH IN A POSTED 40... SB SHADY GROVE RD AT SILVER BELL TER 39.087512 -77.211753 No ... 21-801.1 Transportation Article No BLACK F POTOMAC MD MD Q - Marked Laser (39.0875116666667, -77.2117533333333)
14 19786 2016-03-12 08:02:00 MCP 4th district, Wheaton FAILURE TO DISPLAY REGISTRATION CARD UPON DEMA... CONNECTICUT AVE/ RANDOLPH RD 39.056553 -77.073752 No ... 13-409(b) Transportation Article No OTHER M OLNEY MD MD A - Marked Patrol (39.0565533333333, -77.0737516666667)
15 828227 2015-04-27 15:10:00 MCP 4th district, Wheaton DRIVER FAILURE TO STOP AT STOP SIGN LINE ENNALLS AVE @ GRANDVIEW AVE 39.041407 -77.053135 No ... 21-707(a) Transportation Article No BLACK F GAITHERSBURG MD MD A - Marked Patrol (39.0414066666667, -77.053135)
16 746339 2015-12-21 17:11:00 MCP 2nd district, Bethesda DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC... ARLINGTON RD/ ELM 38.982033 -77.100048 No ... 21-201(a1) Transportation Article No WHITE F BETHESDA MD MD A - Marked Patrol (38.9820333333333, -77.1000483333333)
17 1107454 2014-03-18 19:07:00 MCP 1st district, Rockville DRIVING VEH. ON HWY. WITH UNPAID REGISTRATION FEE W MONTGOMERY AVE AT NELSON ST 39.085607 -77.171098 No ... 13-401(d) Transportation Article No WHITE M BETHESDA MD MD A - Marked Patrol (39.0856066666667, -77.1710983333333)
18 986737 2017-05-17 10:43:00 MCP 1st district, Rockville EXCEEDING POSTED MAXIMUM SPEED LIMIT: 39 MPH I... N/B TRAVILAH RD @ BRUSHWOOD TERR 39.064230 -77.270002 No ... 21-801.1 Transportation Article No ASIAN F POTOMAC MD MD Q - Marked Laser (39.06423, -77.2700016666667)
19 527160 2018-01-07 11:30:00 MCP 5th district, Germantown EXCEEDING THE POSTED SPEED LIMIT OF 40 MPH FREDERICK ROAD/WHEATFIELD DR 39.167680 -77.229683 No ... 21-801.1 Transportation Article No BLACK M GAITHERSBURG MD MD A - Marked Patrol (39.16768, -77.2296833333333)
20 778979 2015-09-12 00:02:00 MCP 3rd district, Silver Spring FAILURE OF LICENSEE TO NOTIFY ADMINISTRATION O... BRIGGS CHANEY DR / CASTLE BLVD 39.078653 -76.944692 No ... 16-116(a) Transportation Article No BLACK F BELTSVILLE MD MD A - Marked Patrol (39.0786533333333, -76.9446916666667)
21 380873 2016-10-27 19:47:00 MCP 4th district, Wheaton PERSON DRIVING MOTOR VEHICLE ON HIGHWAY OR PUB... MONTGOMERY VILLAGE AVE / LOST KNIFE RD 39.157830 -77.204707 No ... 16-303(d) Transportation Article No BLACK M MONTGOMERY VILLAGE MD VA A - Marked Patrol (39.15783, -77.2047066666667)
22 713048 2018-04-16 18:37:00 MCP 4th district, Wheaton DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC... NORBECK ROAD AND LLEWELLYN MANOR WAY 39.118427 -77.027362 No ... 21-201(a1) Transportation Article No WHITE M SILVER SPRING MD MD A - Marked Patrol (39.1184266666667, -77.0273616666667)
23 180280 2015-11-06 08:43:00 MCP 4th district, Wheaton EXCEEDING MAXIMUM SPEED: 54 MPH IN A POSTED 45... NORBECK RD AT NORWOOD RD 39.118655 -77.022628 No ... 21-801.1 Transportation Article No HISPANIC M ROCKVILLE MD MD A - Marked Patrol (39.118655, -77.0226283333333)
24 1168502 2013-11-01 13:10:00 MCP 4th district, Wheaton EXCEEDING THE POSTED SPEED LIMIT OF 30 MPH TIDEWATER CT / RT 97 39.157913 -77.065333 No ... 21-801.1 Transportation Article No WHITE M BROOKEVILLE MD MD O - Foot Patrol (39.1579133333333, -77.0653333333333)

25 rows × 36 columns
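
The cleaning filter in the cell above can be checked on a small frame. This is a minimal sketch, assuming made-up rows: the values in `toy` are hypothetical and are not taken from the dataset; only the column names match the notebook.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the notebook.
toy = pd.DataFrame({
    "Gender": ["M", "F", "U", "F"],
    "Year": [2010, 0, 2015, 2030],
})

# Same conditions as the notebook cell, combined into one boolean mask.
mask = (toy["Gender"] != "U") & (toy["Year"] > 1900) & (toy["Year"] < 2020)
cleaned = toy[mask]

print(cleaned)
```

Only the first row survives: the second has a Year of 0, the third has an unknown gender, and the fourth has a Year after 2019.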

In [60]:
filtered_cols = ["Date Of Stop", "Time Of Stop", "SubAgency", 
                 "Description", "Location", "Latitude", "Longitude",
                 "Violation Type", "Race", "Gender"]

# Keep only the columns used in this analysis; .copy() avoids modifying `original`
sam = original[filtered_cols].copy()
In [61]:
# Generate an empty map centered on Montgomery County, Maryland
def generate_map(loc = [39.1247, -77.1905], zoom = 10.5, tile = "openstreetmap"):
    res_map = folium.Map(location = loc, zoom_start = zoom, control_scale = True, tiles = tile)
    
    # Add the Tile (or Style) of the Map
    folium.TileLayer('openstreetmap').add_to(res_map)
    folium.TileLayer('Stamen Watercolor').add_to(res_map)
    folium.TileLayer('Stamen Toner').add_to(res_map)
    return res_map
    
In [62]:
# This Function returns the designated color assigned to a race.
def color_select(race):
    ethnicity = {'ASIAN': "#ed8134", # Orange
                 'BLACK': "#391cba", #Indigo
                 'HISPANIC': "#119992", #Teal 
                 'NATIVE AMERICAN': "#9412b8", # Violet 
                 'OTHER': "#127bb8", # Blue
                 'WHITE': "#e81c1c"} # Red
    
    return ethnicity[race]
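
`color_select` raises a `KeyError` if a row carries a race label that is not in the dictionary. One possible variation, sketched here as an assumption rather than as part of the original notebook, falls back to a neutral color. The name `color_select_safe` and the grey default are hypothetical.

```python
# Same colors as color_select in the notebook.
RACE_COLORS = {
    "ASIAN": "#ed8134",            # Orange
    "BLACK": "#391cba",            # Indigo
    "HISPANIC": "#119992",         # Teal
    "NATIVE AMERICAN": "#9412b8",  # Violet
    "OTHER": "#127bb8",            # Blue
    "WHITE": "#e81c1c",            # Red
}


def color_select_safe(race, default="#808080"):
    """Return the color for a race label, or a grey default if it is unknown.

    Hypothetical helper: normalizes case and surrounding whitespace first.
    """
    return RACE_COLORS.get(str(race).strip().upper(), default)


print(color_select_safe("ASIAN"), color_select_safe("unknown"))
```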

Map Visualization of Traffic Violations Based on Race and Gender

In [63]:
# Creating an Empty Map
map_total = generate_map()

# Create Different Layers for each race
asian_fg = folium.FeatureGroup(name = "Asian") 
black_fg = folium.FeatureGroup(name = "Black") 
his_fg = folium.FeatureGroup(name = "Hispanic") 
na_fg = folium.FeatureGroup(name = "Native American") 
other_fg = folium.FeatureGroup(name = "Other") 
white_fg = folium.FeatureGroup(name = "White") 

# Map each race label (key) to its feature-group layer (value)
race = {'ASIAN': asian_fg, 
        'BLACK': black_fg, 
        'HISPANIC': his_fg, 
        'NATIVE AMERICAN': na_fg, 
        'OTHER': other_fg, 
        'WHITE': white_fg} 

for ind, row in sam.iterrows():
    # Triangles mark male drivers and squares mark female drivers
    entry = (folium.RegularPolygonMarker(location = [row["Latitude"],row["Longitude"]], popup = row["Description"], 
                                        color= color_select(row["Race"]), fill = True, weight = 1, 
                                        number_of_sides = 3 if row["Gender"] == "M" else 4, 
                                        radius = 4, opacity = .4))
    entry.add_to(race[row["Race"]])

for r in race:
    race[r].add_to(map_total)
    
folium.LayerControl().add_to(map_total)

map_total
Out[63]:
Legend:
Race (marker color): Asian: orange, White: red, Black: indigo, Hispanic: teal, Native American: violet, Other: blue
Gender (marker shape): Male: triangle, Female: square

In [64]:
gr_df = sam.copy()
gr_df["count"] = 1

aggregation_functions = {'count': 'sum'}
nd = gr_df.groupby(['Gender', 'Race']).aggregate(aggregation_functions)

# Setting up the plot and dimension
fig, axs = plt.subplots() 
fig.set_figheight(30)
fig.set_figwidth(40)

b1 = sns.barplot(x="Gender", y ="count", hue="Race", palette = "Spectral", data=nd.reset_index(), ax = axs)
b1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=3, labelspacing=2, fontsize = 20)

b1.set_title("The Occurrence of Traffic Violation Based on Gender and Race", fontsize = 40)
b1.set_ylabel("Count", fontsize = 30)
b1.set_xlabel("Gender", fontsize = 30)
b1.tick_params(axis='both', labelsize=25)

plt.show()
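
The helper `count` column followed by a sum is one way to count rows per group. As a sketch on hypothetical toy data, `groupby(...).size()` gives the same table without the extra column:

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the notebook.
toy = pd.DataFrame({
    "Gender": ["M", "M", "F", "M"],
    "Race": ["WHITE", "WHITE", "ASIAN", "BLACK"],
})

# Approach used in the notebook: add a column of ones and sum it per group.
with_helper = (toy.assign(count=1)
                  .groupby(["Gender", "Race"])
                  .aggregate({"count": "sum"})
                  .reset_index())

# Alternative: count the rows in each group directly.
with_size = toy.groupby(["Gender", "Race"]).size().reset_index(name="count")

print(with_size)
```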

Heatmap Exploring the Occurrence of Violations by Hour

In [65]:
sam["hour"] = [t.hour for t in sam["Time Of Stop"]]
cut = pd.cut(sam["hour"], bins = [0,2,4,6,8,10,12,14,16,18,20,22,24], 
             labels = [1,2,3,4,5,6,7,8,9,10,11,12], right = False)
sam["cut"] = cut
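
To see how the binning above assigns hours, here is a small sketch on hypothetical hour values. With `right = False` each bin includes its left edge and excludes its right edge, so hours 0 and 1 fall in bin 1 and hours 22 and 23 fall in bin 12.

```python
import pandas as pd

# Hypothetical sample of hours (0-23).
hours = pd.Series([0, 1, 2, 13, 22, 23])

bins = list(range(0, 25, 2))   # [0, 2, 4, ..., 24]
labels = list(range(1, 13))    # [1, 2, ..., 12]

# Left-closed, right-open intervals: [0, 2), [2, 4), ..., [22, 24)
binned = pd.cut(hours, bins=bins, labels=labels, right=False)

# For whole hours, integer division yields the same bin numbers.
shortcut = hours // 2 + 1

print(binned.tolist())
print(shortcut.tolist())
```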
In [66]:
df_copy = sam.copy()
df_copy['count'] = 1
hr_map = generate_map()

hm_fg = []
hr = 0
for ind in range(12):
    temp_name = "Hours " + str(hr) + " to " + str(hr + 1)
    hm_fg.append(folium.FeatureGroup(name = temp_name, show= True if ind == 0 else False))
    hr += 2


# Group stops into two-hour bins so that each layer has more data points
for index in range(12):    
    temp = df_copy[df_copy["cut"] == index + 1]
    # Group by location only, so that 'count' sums to the number of stops at each point
    HeatMap(data=temp[['Latitude', 'Longitude', 'count']]
                .groupby(['Latitude', 'Longitude'])
                .sum()
                .reset_index()
                .values.tolist(), 
                radius=8, max_zoom=13).add_to(hm_fg[index])
    
for fg in hm_fg:
    fg.add_to(hr_map)
        
    
folium.LayerControl().add_to(hr_map)

hr_map
Out[66]:

Use the layer control in the top-right corner of the map to choose which time window the heatmap shows.
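
Each heatmap layer is built from a list of `[latitude, longitude, weight]` rows. The sketch below, on hypothetical coordinates, shows one way to produce those rows so that repeated stops at the same location raise the weight of that point.

```python
import pandas as pd

# Hypothetical coordinates; two stops share the same location.
toy = pd.DataFrame({
    "Latitude": [39.0, 39.0, 39.1],
    "Longitude": [-77.0, -77.0, -77.2],
})

# Count the stops per location; the count becomes the heatmap weight.
weights = (toy.assign(count=1)
              .groupby(["Latitude", "Longitude"], as_index=False)["count"]
              .sum())

# Rows in the [lat, lon, weight] form that a heatmap layer accepts.
points = weights.values.tolist()
print(points)
```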

In [67]:
time_map = generate_map()
df_hour_list = []
for bin_label in df_copy["cut"].sort_values().unique():
    # Filter on the same two-hour bin column that the loop iterates over
    df_hour_list.append(df_copy.loc[df_copy["cut"] == bin_label, ['Latitude', 'Longitude', 'count']]
                        .groupby(['Latitude', 'Longitude']).sum().reset_index().values.tolist())

HeatMapWithTime(df_hour_list, radius=8, gradient={0.2: 'blue', 0.4: 'lime', 0.6: 'orange', 1: 'red'}, 
                min_opacity=0.5, max_opacity=0.8, use_local_extrema=True, auto_play=True).add_to(time_map)

folium.LayerControl().add_to(time_map)

time_map
Out[67]:

Heatmap of All Traffic Violations

In [68]:
df_copy = sam.copy()
df_copy['count'] = 1
base_map = generate_map()

# Group by location only, so that 'count' sums to the number of stops at each point
HeatMap(data=df_copy[['Latitude', 'Longitude', 'count']]
            .groupby(['Latitude', 'Longitude'])
            .sum()
            .reset_index()
            .values.tolist(), 
            radius=8, max_zoom=13).add_to(base_map)

folium.LayerControl().add_to(base_map)

base_map
Out[68]:

Exploring Race and The Violation Type

In [69]:
rv = sam.copy()
rv["count"] = 1

aggregation_functions = {'count': 'sum'}
nd = rv.groupby(['Race', 'Violation Type']).aggregate(aggregation_functions)

# Setting up the plot and dimension
fig, axs = plt.subplots() 
fig.set_figheight(30)
fig.set_figwidth(40)

r1 = sns.barplot(x="Race", y ="count", hue="Violation Type", palette = ["#ff8378", "#5bc7a7"], data=nd.reset_index(), ax = axs)
r1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=3, labelspacing=2, fontsize = 20)

r1.set_title("The Occurrence of Traffic Violation and Violation Type Based on Race", fontsize = 40)
r1.set_ylabel("Count", fontsize = 30)
r1.set_xlabel("Race", fontsize = 30)
r1.tick_params(axis='both', labelsize=25)

plt.show()

Exploring Gender and Violation Type

In [70]:
gv = sam.copy()
gv["count"] = 1

aggregation_functions = {'count': 'sum'}
nd = gv.groupby(['Gender', 'Violation Type']).aggregate(aggregation_functions)

# Setting up the plot and dimension
fig, axs = plt.subplots() 
fig.set_figheight(30)
fig.set_figwidth(40)

g1 = sns.barplot(x="Gender", y ="count", hue="Violation Type", palette = ["#ff8378", "#5bc7a7"], data=nd.reset_index(), ax = axs)
g1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
          fancybox=True, shadow=True, ncol=3, labelspacing=2, fontsize = 20)

g1.set_title("The Occurrence of Traffic Violation and Violation Type Based on Gender", fontsize = 40)
g1.set_ylabel("Count", fontsize = 30)
g1.set_xlabel("Gender", fontsize = 30)
g1.tick_params(axis='both', labelsize=25)

plt.show()

Can We Predict What Violation Type Someone Can Get?


In [71]:
data_reg = sam.copy()

vt = {"Warning": 1,
      "Citation": 2,
      "ESERO": 3,
      "SERO": 4}

data_reg["violation_type_num"] = [vt[v] for v in data_reg["Violation Type"]]
data_reg = pd.get_dummies(data_reg, columns = ["Gender"])
data_reg = pd.get_dummies(data_reg, columns = ["Race"])
data_reg["Race_NATIVE"] = data_reg["Race_NATIVE AMERICAN"] # Copy under a name without a space so it can be used in the regression formula

data_reg.head()
Out[71]:
Date Of Stop Time Of Stop SubAgency Description Location Latitude Longitude Violation Type hour cut violation_type_num Gender_F Gender_M Race_ASIAN Race_BLACK Race_HISPANIC Race_NATIVE AMERICAN Race_OTHER Race_WHITE Race_NATIVE
0 2016-05-01 23:08:00 4th district, Wheaton DRIVER FAIL TO STOP AT RED TRAFFIC SIGNAL BEFO... GEORGIA AVE AT GLENALLAN AVE 39.063522 -77.055263 Citation 23 12 2 0 1 0 0 1 0 0 0 0
1 2017-11-30 01:14:00 3rd district, Silver Spring DRIVING VEHICLE IN EXCESS OF REASONABLE AND PR... RANDOLPH ROAD AT TAMARACK RD 39.067270 -76.984982 Citation 1 1 2 0 1 0 1 0 0 0 0 0
2 2012-05-24 10:48:00 3rd district, Silver Spring DRIVER USING HANDS TO USE HANDHELD TELEPHONE W... WAYNE AVE / COLESVILLE RD, W/B 38.995165 -77.031199 Citation 10 6 2 1 0 0 0 1 0 0 0 0
3 2012-10-18 03:30:00 3rd district, Silver Spring DRIVING TO DRIVE MOTOR VEHICLE ON HIGHWAY WITH... COLUMBIA PIKE AT LORRAIN AVE 38.998501 -77.026377 Citation 3 2 2 0 1 0 1 0 0 0 0 0
4 2012-02-16 07:47:00 6th district, Gaithersburg / Montgomery Village EXCEEDING MAXIMUM SPEED: 34 MPH IN A POSTED 25... GAME PRESERVE RD N/B (11400 BLOCK) 39.157287 -77.239419 Citation 7 4 2 1 0 0 0 1 0 0 0 0
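
The encoding step above can be tried on a small frame. This is a sketch under stated assumptions: the rows in `toy` are hypothetical, and `Series.map` is swapped in for the list comprehension so that a label missing from `vt` shows up as `NaN` instead of raising a `KeyError`.

```python
import pandas as pd

# Hypothetical toy rows; only the column names and labels mirror the notebook.
toy = pd.DataFrame({
    "Violation Type": ["Warning", "Citation", "Warning"],
    "Gender": ["M", "F", "F"],
})

vt = {"Warning": 1, "Citation": 2, "ESERO": 3, "SERO": 4}

# Labels that are missing from vt become NaN and can be counted.
toy["violation_type_num"] = toy["Violation Type"].map(vt)
unmapped = toy["violation_type_num"].isna().sum()

# One indicator column per category value.
encoded = pd.get_dummies(toy, columns=["Gender"])

print(encoded.columns.tolist(), unmapped)
```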
In [72]:
distlr = sm.ols(formula = 'violation_type_num ~ hour + Race_ASIAN + Race_BLACK + Race_WHITE + Race_HISPANIC + Race_OTHER + Race_NATIVE + Gender_F + Gender_M', data = data_reg).fit()

distlr.summary()
Out[72]:
OLS Regression Results
Dep. Variable:     violation_type_num   R-squared:             0.022
Model:             OLS                  Adj. R-squared:        0.020
Method:            Least Squares        F-statistic:           15.66
Date:              Mon, 16 Dec 2019     Prob (F-statistic):    1.91e-20
Time:              06:53:07             Log-Likelihood:        -3559.3
No. Observations:  4984                 AIC:                   7135.
Df Residuals:      4976                 BIC:                   7187.
Df Model:          7
Covariance Type:   nonrobust

                 coef   std err        t   P>|t|   [0.025   0.975]
Intercept      0.9055     0.019   48.342   0.000    0.869    0.942
hour          -0.0058     0.001   -5.846   0.000   -0.008   -0.004
Race_ASIAN     0.0812     0.035    2.291   0.022    0.012    0.151
Race_BLACK     0.2167     0.028    7.747   0.000    0.162    0.272
Race_WHITE     0.1632     0.028    5.926   0.000    0.109    0.217
Race_HISPANIC  0.2607     0.029    9.011   0.000    0.204    0.317
Race_OTHER     0.0773     0.037    2.114   0.035    0.006    0.149
Race_NATIVE    0.1064     0.140    0.758   0.448   -0.169    0.382
Gender_F       0.4277     0.013   34.122   0.000    0.403    0.452
Gender_M       0.4779     0.011   41.863   0.000    0.455    0.500

Omnibus:         18321.544   Durbin-Watson:      2.000
Prob(Omnibus):   0.000       Jarque-Bera (JB):   761.490
Skew:            0.075       Prob(JB):           4.41e-166
Kurtosis:        1.091       Cond. No.           1.01e+17


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.09e-28. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
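
Warning [2] is consistent with the formula including an intercept together with an indicator for every gender and every race: in each row the gender indicators sum to 1, which duplicates the intercept column. The summary also reports Df Model: 7 although nine regressors were listed. The sketch below uses hypothetical values and plain NumPy (not the notebook's model) to show the rank deficiency and how dropping one indicator removes it.

```python
import numpy as np

# Hypothetical design columns for six stops.
gender_f = np.array([1, 0, 0, 1, 0, 1])
gender_m = 1 - gender_f                  # exactly one gender per row
intercept = np.ones_like(gender_f)
hour = np.array([23, 1, 10, 3, 7, 14])

# Intercept plus both gender indicators: gender_f + gender_m == intercept.
full = np.column_stack([intercept, hour, gender_f, gender_m])

# Dropping one indicator makes the remaining level the reference category.
reduced = np.column_stack([intercept, hour, gender_f])

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print(rank_full, full.shape[1])        # rank is lower than the column count
print(rank_reduced, reduced.shape[1])  # rank equals the column count
```

One option in pandas is `pd.get_dummies(..., drop_first=True)`, which omits the first level of each encoded column.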
In [73]:
# Setting up the plot and dimension
fig, axs = plt.subplots(nrows = 1)
fig.set_figheight(10)
fig.set_figwidth(20)

predict = distlr.predict({"hour": data_reg["hour"],"Gender_F": data_reg['Gender_F'], 
                          "Gender_M": data_reg['Gender_M'], "Race_ASIAN": data_reg['Race_ASIAN'],
                          "Race_BLACK": data_reg['Race_BLACK'], "Race_WHITE": data_reg['Race_WHITE'],
                          "Race_HISPANIC": data_reg['Race_HISPANIC'], "Race_OTHER": data_reg['Race_OTHER'],
                          "Race_NATIVE": data_reg['Race_NATIVE']})

resid = data_reg["violation_type_num"] - predict
d1 = sns.violinplot(x = data_reg["hour"], y = resid, ax = axs)
d1.set_title("Violin Plot of Residuals vs. Hour for the Multiple Linear Regression Model", fontsize = 20)
d1.set_ylabel("Residual", fontsize = 15)
d1.set_xlabel("Hour", fontsize = 15)
d1.tick_params(axis='both', labelsize=15)

plt.show()

Our model predicts a continuous value, mostly between 1 and 2 (3 for ESERO and 4 for SERO are technically possible, but those outcomes are very rare), so it can output a decimal such as 1.5. A fractional value is not a valid violation type: in practice the outcome is either 1 (warning) or 2 (citation). To make the predictions interpretable, we round them to the nearest category: any prediction below 1.5 becomes 1, and any prediction from 1.5 up to (but not including) 2.5 becomes 2.
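
The rounding rule can also be written in vectorized form. This is a minimal sketch, assuming NumPy is acceptable here; the function name `round_to_category` is hypothetical. It uses `floor(p + 0.5)` so that a value of exactly 1.5 rounds up to 2, and it clips the result to the range 1 to 4.

```python
import numpy as np


def round_to_category(pred, low=1, high=4):
    """Round predictions half-up to the nearest integer code in [low, high]."""
    pred = np.asarray(pred, dtype=float)
    return np.clip(np.floor(pred + 0.5), low, high).astype(int)


# Hypothetical predictions, including values outside the 1-4 range.
example = round_to_category([0.2, 0.9, 1.49, 1.5, 2.5, 3.7, 5.2])
print(example.tolist())
```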

In [74]:
rounded = []
for p in predict:
    if p < 1.5:
        rounded.append(1)
    elif p < 2.5: 
        rounded.append(2)
    elif p < 3.5:
        rounded.append(3)
    else:
        rounded.append(4)
        
# Setting up the plot and dimension
fig, axs = plt.subplots(nrows = 1)
fig.set_figheight(10)
fig.set_figwidth(20)        

resid = data_reg["violation_type_num"] - rounded
d2 = sns.violinplot(x = data_reg["hour"], y = resid, ax = axs)
d2.set_title("Violin Plot of Residuals vs. Hours for the Multiple Linear Regression Model", fontsize = 20)
d2.set_ylabel("Residual", fontsize = 15)
d2.set_xlabel("Hour", fontsize = 15)
d2.tick_params(axis='both', labelsize=15)

plt.show()

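In the violin plot of residuals vs. hour for the multiple linear regression model, a residual of 0 means that the model predicted the correct violation type. Because the residual is the actual value minus the rounded prediction, a residual of 1 means the prediction was one category too low (for example, a warning predicted where a citation was issued), and a residual of -1 means it was one category too high.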
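
The same residuals can be summarized numerically. The sketch below uses hypothetical actual and predicted codes (not the notebook's results) to show how the share of zero residuals gives the accuracy, and how the sign of a nonzero residual separates under-predictions from over-predictions.

```python
import numpy as np

# Hypothetical codes: 1 = warning, 2 = citation.
actual = np.array([1, 2, 2, 1, 2])
rounded = np.array([1, 2, 1, 2, 2])

resid = actual - rounded

accuracy = np.mean(resid == 0)   # share of correct predictions
under = np.mean(resid > 0)       # predicted category too low
over = np.mean(resid < 0)        # predicted category too high

print(resid.tolist(), accuracy, under, over)
```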

In [75]:
# Setting up the plot and dimension
fig, axs = plt.subplots(nrows = 1)
fig.set_figheight(10)
fig.set_figwidth(20)        

d2 = sns.violinplot(x = sam["Race"], y = resid, ax = axs)
d2.set_title("Violin Plot of Residuals vs. Race for the Multiple Linear Regression Model", fontsize = 20)
d2.set_ylabel("Residual", fontsize = 15)
d2.set_xlabel("Race", fontsize = 15)
d2.tick_params(axis='both', labelsize=15)

plt.show()
In [76]:
# Setting up the plot and dimension
fig, axs = plt.subplots(nrows = 1)
fig.set_figheight(10)
fig.set_figwidth(20)        

d2 = sns.violinplot(x = sam["Gender"], y = resid, palette = "coolwarm",ax = axs)
d2.set_title("Violin Plot of Residuals vs. Gender for the Multiple Linear Regression Model", fontsize = 20)
d2.set_ylabel("Residual", fontsize = 15)
d2.set_xlabel("Gender", fontsize = 15)
d2.tick_params(axis='both', labelsize=15)

plt.show()

Conclusion:


Drive Safe Out There!